Abstract

There has been much interest in the nonparametric testing of conditional independence in the econometric and statistical literature, but the simplest and potentially most useful method, based on the sample partial correlation, seems to have been overlooked, its distribution having been investigated only in some simple parametric instances. The present note shows that an easy-to-apply permutation test based on the sample partial correlation with nonparametrically estimated marginal regressions has good large and small sample properties.

1 Introduction 

Various authors have developed tests of conditional independence without assuming normality of the variables. For example, Kendall (1942), Goodman (1959) and Gripenberg (1992) proposed partial versions of Kendall's tau. Recently, there has been a focus on incorporating modern nonparametric methods to test for conditional independence (Su & White, 2007, 2008; Song, 2009; Huang, 2010; Bouezmarni, Rombouts, & Taamouti, 2010). Conditional independence relations are the building blocks of graphical models, which can be used to investigate causal relations for economic and other data. Surprisingly, however, even though the partial correlation is very well known, little seems to be known about its sampling distribution unless very strong assumptions are made. The present note fills this gap in the literature and shows that tests based on the partial correlation are easy to apply, while simulations indicate good small sample properties.

Consider the random triple (X, Y, Z), with Y and Z real and X arbitrary, and suppose the question of interest is whether Y and Z are conditionally independent given X, denoted $Y \perp\!\!\!\perp Z \mid X$. If

$$Y = g(X) + \varepsilon_Y \quad\text{and}\quad Z = h(X) + \varepsilon_Z \qquad (1)$$
for certain functions g and h, with $E(\varepsilon_Y \mid X = x) = E(\varepsilon_Z \mid X = x) = 0$ for all x, the partial correlation coefficient is the correlation between the error terms, i.e.,



$$\rho_{YZ.X} = \frac{E(\varepsilon_Y \varepsilon_Z)}{\sqrt{E(\varepsilon_Y^2)\,E(\varepsilon_Z^2)}}.$$



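If $Y \perp\!\!\!\perp Z \mid X$, then

$$E(\varepsilon_Y \varepsilon_Z) = \int E(\varepsilon_Y \varepsilon_Z \mid X = x)\,dF_X(x) = \int E(\varepsilon_Y \mid X = x)\,E(\varepsilon_Z \mid X = x)\,dF_X(x) = 0.$$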

Hence, conditional independence implies that the partial correlation equals zero. 
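The point of the argument above is that Y and Z may be strongly correlated marginally, through their common dependence on X, while the partial correlation is zero. A quick Monte Carlo illustration, assuming the hypothetical regression functions g(x) = sin x and h(x) = x³, X uniform on [-2, 2], and independent standard normal errors (none of which come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical instance of model (1): g(x) = sin(x), h(x) = x**3, independent
# normal errors, so that Y and Z are conditionally independent given X.
x = rng.uniform(-2.0, 2.0, n)
eps_y = rng.normal(size=n)
eps_z = rng.normal(size=n)
y = np.sin(x) + eps_y
z = x**3 + eps_z

raw_corr = np.corrcoef(y, z)[0, 1]                        # marginal correlation of Y and Z
error_corr = np.corrcoef(y - np.sin(x), z - x**3)[0, 1]   # correlation of the true errors
print(f"corr(Y, Z) = {raw_corr:.3f}, corr(eps_Y, eps_Z) = {error_corr:.3f}")
```

The marginal correlation is clearly positive, whereas the correlation of the errors is close to zero, as the derivation predicts.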

This paper considers a conditional independence test for a sample $\{(X_i, Y_i, Z_i)\}$ of independent replications of (X, Y, Z), based on the sample partial correlation $r_{YZ.X}$. If g and h are known, conditional independence can easily be tested by a permutation test using $r_{YZ.X}$. In practice, however, g and h are unknown, and our main result, Theorem 1 in Section 2, shows that replacing the regression functions g and h by appropriate estimates does not affect the asymptotic distribution of $r_{YZ.X}$. The small sample distribution of the resulting estimator is typically analytically intractable, but the simulation study in Section 3, with n = 20 and n = 100 and using cubic spline smoothers for estimating g and h, shows close-to-nominal Type I error rates and little loss of power due to estimating the marginal regressions.

2 Large sample distribution of sample partial correlation with estimated marginal regressions

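Assume (1) holds and suppose $(X_1, Y_1, Z_1), \dots, (X_n, Y_n, Z_n)$ are independent replications of (X, Y, Z). Then with

$$\varepsilon_{Yi} = Y_i - g(X_i) \quad\text{and}\quad \varepsilon_{Zi} = Z_i - h(X_i),$$

the sample partial correlation coefficient is

$$r_{YZ.X} = \frac{\sum_{i=1}^n \varepsilon_{Yi}\varepsilon_{Zi}}{\left(\sum_{i=1}^n \varepsilon_{Yi}^2 \sum_{i=1}^n \varepsilon_{Zi}^2\right)^{1/2}}. \qquad (2)$$

If g and h are known, conditional independence can be tested using the permutation test of independence for the $\{(\varepsilon_{Yi}, \varepsilon_{Zi})\}$ based on $r_{YZ.X}$, and the power of such a test is evidently the same as for an unconditional test of independence between directly observed $\varepsilon_Y$ and $\varepsilon_Z$. In practice, however, we need to estimate g and h, say by estimators $\hat g$ and $\hat h$. This yields estimated errors

$$\hat\varepsilon_{Yi} = Y_i - \hat g(X_i) \quad\text{and}\quad \hat\varepsilon_{Zi} = Z_i - \hat h(X_i)$$

and an estimated sample partial correlation coefficient

$$\hat r_{YZ.X} = \frac{\sum_{i=1}^n \hat\varepsilon_{Yi}\hat\varepsilon_{Zi}}{\left(\sum_{i=1}^n \hat\varepsilon_{Yi}^2 \sum_{i=1}^n \hat\varepsilon_{Zi}^2\right)^{1/2}}. \qquad (3)$$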
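For the case of known g and h, the statistic in (2) and its permutation test can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the helper names `partial_corr` and `permutation_pvalue`, the two-sided p-value with the usual +1 correction, and the example functions g(x) = cos 3x and h(x) = x² are all assumptions.

```python
import numpy as np

def partial_corr(e_y, e_z):
    """Uncentered correlation of residual pairs, following the form of (2) and (3)."""
    e_y = np.asarray(e_y, dtype=float)
    e_z = np.asarray(e_z, dtype=float)
    return np.sum(e_y * e_z) / np.sqrt(np.sum(e_y**2) * np.sum(e_z**2))

def permutation_pvalue(e_y, e_z, n_perm=999, seed=None):
    """Two-sided permutation p-value for independence of the pairs (e_y[i], e_z[i])."""
    rng = np.random.default_rng(seed)
    r_obs = partial_corr(e_y, e_z)
    # Shuffling e_z breaks its pairing with e_y, giving draws under the null.
    r_perm = np.array([partial_corr(e_y, rng.permutation(e_z)) for _ in range(n_perm)])
    p = (1 + np.sum(np.abs(r_perm) >= abs(r_obs))) / (n_perm + 1)
    return r_obs, p

# Usage with hypothetical known g(x) = cos(3x), h(x) = x**2 and correlated errors.
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0.0, 1.0, n)
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
y = np.cos(3 * x) + errors[:, 0]
z = x**2 + errors[:, 1]
r, p = permutation_pvalue(y - np.cos(3 * x), z - x**2, seed=2)
print(f"r = {r:.3f}, permutation p-value = {p:.3f}")
```

Replacing the true g and h by estimates $\hat g$ and $\hat h$ in the residuals gives the test based on (3).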

Very little appears to have been published about the large or small sample distribution of $\hat r_{YZ.X}$, except when (X, Y, Z) has a normal distribution; then g and h are linear, and with d the dimension of X and $\hat g$ and $\hat h$ the least squares estimators of the regression planes,

$$t = \frac{\sqrt{n - 2 - d}\;\hat r_{YZ.X}}{\sqrt{1 - \hat r_{YZ.X}^2}} \qquad (4)$$

has a t-distribution with n − 2 − d degrees of freedom if conditional independence holds (Kendall & Stuart, 1973, Section 27.22). The corresponding unconditional test statistic has a t-distribution with n − 2 degrees of freedom under independence, so there is a loss of d degrees of freedom, or d observations, due to the conditioning on X. The distribution of $\hat r_{YZ.X}$ based on linear regressions under nonnormality was studied by Steiger and Browne (1984) and Boik and Haaland (2006); the asymptotic distribution of (4) was shown to be standard normal under broad conditions if conditional independence is true.
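A minimal sketch of the classical linear version of (4), assuming least squares fits with an intercept; the function name `linear_partial_corr_t` is hypothetical. It returns $\hat r_{YZ.X}$, the statistic t and its degrees of freedom n − 2 − d; a p-value could then be obtained from a t-distribution routine such as `scipy.stats.t.sf`.

```python
import numpy as np

def linear_partial_corr_t(x, y, z):
    """Partial correlation from least squares residuals and the t-statistic (4)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    design = np.column_stack([np.ones(n), x])   # intercept plus the d columns of X
    # Residuals of Y and Z after linear regression on X
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    res_z = z - design @ np.linalg.lstsq(design, z, rcond=None)[0]
    r = np.sum(res_y * res_z) / np.sqrt(np.sum(res_y**2) * np.sum(res_z**2))
    df = n - 2 - d
    t = np.sqrt(df) * r / np.sqrt(1.0 - r**2)
    return r, t, df

# Hypothetical normal data with linear g and h and conditionally independent errors.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=50)
z = -1.0 + 0.5 * x[:, 0] + rng.normal(size=50)
r, t, df = linear_partial_corr_t(x, y, z)
print(f"r = {r:.3f}, t = {t:.3f} on {df} degrees of freedom")
```

For d = 1 this residual-based value coincides with the familiar first-order formula $(r_{YZ} - r_{YX} r_{ZX})/\sqrt{(1 - r_{YX}^2)(1 - r_{ZX}^2)}$. It is only valid when g and h are linear.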

If, however, at least one of g or h is nonlinear, a test based on (4) with linear least squares estimates $\hat g$ and $\hat h$ plugged in can break down entirely (see Section 3). The main result of this paper is the following theorem, which shows that (2) and (3) have identical asymptotic behaviour for appropriately estimated regression curves. We will make use of the following assumptions:

A1: $E(\varepsilon_Y^4) < \infty$ and $E(\varepsilon_Z^4) < \infty$.

A2: For some $q_1, q_2 > 0$ and all x, $n^{q_1}(\hat g(x) - g(x)) = O_p(1)$ and $n^{q_2}(\hat h(x) - h(x)) = O_p(1)$, with uniform convergence on any compact set.

Theorem 1 Suppose A1 and A2 hold. Then

$$n^{1/2}\left(\hat r_{YZ.X} - \rho_{YZ.X}\right) = n^{1/2}\left(r_{YZ.X} - \rho_{YZ.X}\right) + O_p\!\left(n^{-\min(q_1, q_2)}\right).$$

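Proof: By standard arguments,

$$\hat\varepsilon_{Yi} = \varepsilon_{Yi}\left(1 + O_p[n^{-q_1}]\right) \quad\text{and}\quad \hat\varepsilon_{Zi} = \varepsilon_{Zi}\left(1 + O_p[n^{-q_2}]\right),$$

so

$$\hat\varepsilon_{Yi}\hat\varepsilon_{Zi} = \varepsilon_{Yi}\varepsilon_{Zi}\left(1 + O_p[n^{-\min(q_1, q_2)}]\right).$$

Hence, using A1 and A2 and with $\mathrm{cov}_{YZ.X} = n^{-1}\sum_{i=1}^n \varepsilon_{Yi}\varepsilon_{Zi}$ and $\widehat{\mathrm{cov}}_{YZ.X} = n^{-1}\sum_{i=1}^n \hat\varepsilon_{Yi}\hat\varepsilon_{Zi}$,

$$n^{1/2}\left(\widehat{\mathrm{cov}}_{YZ.X} - \mathrm{cov}_{YZ.X}\right) = O_p[n^{-\min(q_1, q_2)}].$$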
Figure 1: Randomly generated data with different noise-to-signal ratios $\lambda$. Here, $\sigma_0 = 1$ and so $\sigma_\varepsilon = \lambda\sigma_0 = \lambda$.



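The theorem then follows because

$$\hat r_{YZ.X} = \left(n^{-1}\sum_{i=1}^n \hat\varepsilon_{Yi}^2\right)^{-1/2}\left(n^{-1}\sum_{i=1}^n \hat\varepsilon_{Zi}^2\right)^{-1/2}\widehat{\mathrm{cov}}_{YZ.X},$$

and, by the same arguments, $n^{-1}\sum_{i=1}^n \hat\varepsilon_{Yi}^2 = n^{-1}\sum_{i=1}^n \varepsilon_{Yi}^2\,(1 + O_p[n^{-q_1}])$ and $n^{-1}\sum_{i=1}^n \hat\varepsilon_{Zi}^2 = n^{-1}\sum_{i=1}^n \varepsilon_{Zi}^2\,(1 + O_p[n^{-q_2}])$, so the normalizing factors of $\hat r_{YZ.X}$ and $r_{YZ.X}$ differ only by a factor $1 + O_p[n^{-\min(q_1, q_2)}]$. □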

Examples of nonparametric estimators of g and h satisfying A2 under suitable smoothness conditions are local polynomial and cubic spline estimators (see, e.g., Wasserman, 2006).

To summarize, Theorem 1 justifies using the following procedure. First, estimate g and h from the sample; then plug the estimates into $r_{YZ.X}$ to obtain $\hat r_{YZ.X}$; and finally calculate a p-value for the conditional independence hypothesis using the permutation test applied to $\hat r_{YZ.X}$.
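The three steps of this procedure can be sketched as follows. The simulations in Section 3 use cubic splines; as a self-contained stand-in, this sketch swaps in a Gaussian-kernel local linear smoother (a local polynomial estimator of degree one). The bandwidth value, the function names `local_linear_fit` and `ci_permutation_test`, and the example data are hypothetical choices, not taken from the paper.

```python
import numpy as np

def local_linear_fit(x, y, bandwidth):
    """Gaussian-kernel local linear smoother, evaluated at the data points x."""
    diff = x[None, :] - x[:, None]                 # diff[i, j] = x[j] - x[i]
    w = np.exp(-0.5 * (diff / bandwidth) ** 2)     # kernel weights around each x[i]
    s0 = w.sum(axis=1)
    s1 = (w * diff).sum(axis=1)
    s2 = (w * diff**2).sum(axis=1)
    # Equivalent-kernel weights of the local linear estimator; each row sums to one.
    weights = w * (s2[:, None] - diff * s1[:, None]) / (s0 * s2 - s1**2)[:, None]
    return weights @ y

def ci_permutation_test(x, y, z, bandwidth, n_perm=999, seed=None):
    """Step 1: estimate g and h. Step 2: form r_hat. Step 3: permutation p-value."""
    rng = np.random.default_rng(seed)
    e_y = y - local_linear_fit(x, y, bandwidth)    # estimated errors for Y
    e_z = z - local_linear_fit(x, z, bandwidth)    # estimated errors for Z

    def corr(a, b):
        return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

    r_hat = corr(e_y, e_z)
    r_perm = np.array([corr(e_y, rng.permutation(e_z)) for _ in range(n_perm)])
    p_value = (1 + np.sum(np.abs(r_perm) >= abs(r_hat))) / (n_perm + 1)
    return r_hat, p_value

# Hypothetical data with nonlinear g, h and conditionally dependent errors.
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0.0, 1.0, n)
errors = 0.5 * rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=n)
y = np.sin(2 * np.pi * x) + errors[:, 0]
z = np.cos(2 * np.pi * x) + errors[:, 1]
r_hat, p_value = ci_permutation_test(x, y, z, bandwidth=0.05, seed=4)
print(f"r_hat = {r_hat:.3f}, p-value = {p_value:.3f}")
```

Any smoother meeting A2 could replace `local_linear_fit`; the permutation step is unchanged.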
3 Simulation study



In our simulation study, the functions g and h were generated from an integrated Wiener process, i.e.,

$$g(x) = \sigma_0 \int_0^x W_{1t}\,dt \quad\text{and}\quad h(x) = \sigma_0 \int_0^x W_{2t}\,dt, \qquad (5)$$

where $W_{1t}$ and $W_{2t}$ are independent Wiener processes. Note that, with probability one, g and h are once differentiable. The error pairs $\{(\varepsilon_Y(x), \varepsilon_Z(x))\}$ were generated from a bivariate normal distribution with correlation $\rho_{YZ.X}$ and equal marginal standard deviations, the common value denoted $\sigma_\varepsilon$. A noise-to-signal ratio can be defined as $\lambda = \sigma_\varepsilon/\sigma_0$. In Figure 1, randomly generated (centered) curves and data are plotted with $\sigma_0 = 1$ and $\lambda \in \{0.1, 0.3, 0.5, 0.7\}$.
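A sketch of how data from model (1) with g and h as in (5) might be generated. The design is not stated above, so X uniform on [0, 1] is an assumption here, and the Wiener paths are approximated on a fine grid with a trapezoidal rule for the integral rather than simulated exactly; `integrated_wiener` and `simulate` are hypothetical names.

```python
import numpy as np

def integrated_wiener(x, sigma0=1.0, grid_size=10_000, rng=None):
    """Approximate sigma0 * int_0^x W_t dt at points x in [0, 1] using a fine grid."""
    rng = np.random.default_rng(rng)
    t = np.linspace(0.0, 1.0, grid_size + 1)
    dt = t[1] - t[0]
    # Wiener path on the grid: cumulative sum of N(0, dt) increments, starting at 0
    w = np.concatenate([[0.0], np.cumsum(rng.normal(scale=np.sqrt(dt), size=grid_size))])
    # Trapezoidal rule for int_0^t W_u du at every grid point
    iw = np.concatenate([[0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * dt)])
    return sigma0 * np.interp(x, t, iw)

def simulate(n, lam, rho, sigma0=1.0, rng=None):
    """Draw (X, Y, Z) from model (1) with g, h as in (5); lam is the noise-to-signal ratio."""
    rng = np.random.default_rng(rng)
    x = np.sort(rng.uniform(0.0, 1.0, n))          # assumed design: X ~ Uniform(0, 1)
    g = integrated_wiener(x, sigma0, rng=rng)      # independent paths W_1 and W_2
    h = integrated_wiener(x, sigma0, rng=rng)
    sigma_eps = lam * sigma0                        # lambda = sigma_eps / sigma0
    cov = sigma_eps**2 * np.array([[1.0, rho], [rho, 1.0]])
    errors = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return x, g + errors[:, 0], h + errors[:, 1]

x, y, z = simulate(n=100, lam=0.5, rho=0.0, rng=0)
```

Setting `rho=0.0` generates data under conditional independence, which is what Type I error rates are computed from.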

We first simulated data according to the above model and fitted linear regression curves 
using the least squares method, which is the standard (and in this case, wrong) method for 
calculating partial correlation. Conditional independence was then tested using the permutation test for the estimated partial correlation. This procedure broke down completely: for example, with n = 100, $\alpha = 0.05$ and $\lambda = 0.5$, we found the Type I error probability to be 0.71.

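We next simulated data again according to model (5) with normal errors, but now estimated g and h according to the same, and thus correct, model. Such estimators are cubic splines (Green & Silverman, 1994; Wahba, 1990). Only the estimates at the data points are required, and these are given by

$$\hat{\mathbf g} = \Sigma\left(\Sigma + \lambda_Y^2 I\right)^{-1}\mathbf y \quad\text{and}\quad \hat{\mathbf h} = \Sigma\left(\Sigma + \lambda_Z^2 I\right)^{-1}\mathbf z,$$

where

$$\lambda_Y = \sigma_{\varepsilon,Y}/\sigma_{0,Y} \quad\text{and}\quad \lambda_Z = \sigma_{\varepsilon,Z}/\sigma_{0,Z},$$

$\mathbf y$ and $\mathbf z$ are the vectors of observed $Y_i$ and $Z_i$, and $\Sigma$ is the covariance matrix of the integrated Wiener process at the observed $X_i$ (e.g., Green & Silverman, 1994). For $\sigma_0$ and $\sigma_\varepsilon$ we used the true values by which the data were generated. We then did a permutation test for independence on the $\{(\hat\varepsilon_{Yi}, \hat\varepsilon_{Zi})\}$ using the estimated partial correlation $\hat r_{YZ.X}$.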
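The spline fit above can be computed directly from the covariance of the integrated Wiener process. For $\sigma_0 = 1$, $\mathrm{Cov}\left(\int_0^s W_u\,du, \int_0^t W_u\,du\right) = s^2(3t - s)/6$ for $s \le t$; a common factor $\sigma_0^2$ cancels against $\sigma_\varepsilon^2$ in $\sigma_0^2\Sigma(\sigma_0^2\Sigma + \sigma_\varepsilon^2 I)^{-1}$, which leaves $\Sigma(\Sigma + \lambda^2 I)^{-1}$. This is a sketch under those assumptions; `iwp_cov`, `spline_fit` and the example data are hypothetical.

```python
import numpy as np

def iwp_cov(x):
    """Covariance matrix of int_0^x W_t dt at points x: s^2 (3t - s) / 6 for s <= t."""
    s = np.minimum.outer(x, x)
    t = np.maximum.outer(x, x)
    return s**2 * (3.0 * t - s) / 6.0

def spline_fit(x, y, lam):
    """Fitted values Sigma (Sigma + lam^2 I)^{-1} y at the data points, lam = sigma_eps / sigma0."""
    sigma = iwp_cov(x)
    return sigma @ np.linalg.solve(sigma + lam**2 * np.eye(len(x)), y)

# Hypothetical data; the estimated errors would then enter the permutation test.
rng = np.random.default_rng(5)
n = 100
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(3 * x) + 0.5 * rng.normal(size=n)
y_hat = spline_fit(x, y, lam=0.5)
e_y = y - y_hat                                    # estimated errors for Y
```

Solving a linear system instead of forming the inverse explicitly is the usual numerically safer choice; a larger `lam` gives more smoothing.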

Figure 2 shows power, Type I error rates and the loss of power due to conditioning for several values of the partial correlation (based on 50,000 replications). For n = 20 and n = 100, unless the noise-to-signal ratio is very small, Type I error rates are close to nominal and there is very little loss of power due to conditioning. The method breaks down when the error standard deviation becomes very small, which is to be expected, as this leads to strong overfitting.

Figure 3, based on 500,000 replications, shows the effect of over- or undersmoothing on the Type I error rate, and hence gives an indication of robustness. Undersmoothing has a negative effect on the Type I error rate, while oversmoothing by a factor of about three for $\lambda$ does not negatively affect the error rate. (In fact, some oversmoothing has a positive effect on the error rate.) Note, however, that for n = 100, undersmoothing by a factor as large as three for $\lambda$ still gives a Type I error rate of 6% ($\alpha$ = 5%), which should be acceptable for most practical purposes.

Figure 2: Power curves and Type I error rates using $\alpha = 0.05$ for several values of the partial correlation $\rho$. The dotted lines give the probabilities that $H_0$ is rejected for the corresponding test of marginal independence and give an asymptote for the curves.

Figure 3: The effect of undersmoothing ($\lambda < 0.5$) and oversmoothing ($\lambda > 0.5$) on Type I error rates for $\alpha = 0.05$, with data generated using $\lambda = 0.5$. The plot indicates strong robustness against oversmoothing up to a factor three for $\lambda$ ($0.5 < \lambda < 1.5$). Severe under- or oversmoothing leads to breakdown of Type I error rates.

4 Conclusions 

Theorem 1 proves that, for sufficiently large samples, the partial correlation coefficient with appropriately estimated marginal regressions can be used in a very simple way, via the permutation test, to test for conditional independence. Our simulation study shows that this method also works well for samples as small as n = 20, with little loss of efficiency; this is in line with what was known to be the case for normal distributions. The method is very robust to moderate oversmoothing of the marginal regressions, but severe under- or oversmoothing leads to a breakdown of Type I error rates.